The flash sorting process produces binary files. The purpose of this notebook is to demonstrate how to read and display data from these files. In practice, these files are no more difficult to work with than columns of ASCII data, because modern access libraries (PyTables for the HDF5 files, netCDF4 for the gridded NetCDF files) automate the tedious parts of traversing each file's internal hierarchy.
The flash sorting analysis produces three kinds of files.
Additional data tables and gridded fields can be added to accommodate, for instance, VLF/LF stroke detection data and an associated LMA flash_id.
In [1]:
%%bash
RESULTS="/data/GLM-wkshp/flashsort/results/"
ls $RESULTS
echo "-----"
ls $RESULTS/h5_files/2009/Apr/10/
echo "-----"
ls $RESULTS/grid_files/2009/Apr/10/
echo "-----"
cat $RESULTS/h5_files/2009/Apr/10/input_params.py
In [2]:
import os
import tables, pandas
import numpy as np
results_dir = "/data/GLM-wkshp/flashsort/results/"
h5name = os.path.join(results_dir, "h5_files/2009/Apr/10/LYLOUT_090410_180000_3600.dat.flash.h5")
h5 = tables.open_file(h5name)
print(h5)
The file's internal hierarchy of groups and data tables is easy to navigate with tab completion. In the next cell, type
h5.root.
and then press Tab to get a list of group and table names.
Let's look at the events first. Tab-complete to that group, then tab-complete again to get a reference to the data table. Note that a flash_id column has been added to the original source data.
In [3]:
h5.root  # add a "." after root and press Tab to list the groups and tables
In [4]:
event_table = h5.root.events.LMA_090410_180000_3600
event_table
We can get all the altitudes ...
In [5]:
event_table.cols.alt[:]
Or the first ten sources ...
In [6]:
event_table[:10]
... or the flash IDs for the first hundred sources.
In [7]:
event_table[:100]['flash_id']
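Because rows read from a table come back as a NumPy array with a named (structured) dtype, the three kinds of indexing above work the same way on an ordinary in-memory array. The sketch below builds a small, hypothetical stand-in for the events table; it reuses the lat, lon, alt, time, and flash_id column names from this notebook, but the values and dtypes are invented and the real table has more columns.

```python
import numpy as np

# Hypothetical stand-in for a few rows of the events table; the values are
# invented and the real table has more columns than these five.
events = np.array(
    [
        (34.70, -86.60, 5200.0, 0.10, 1),
        (34.71, -86.61, 6100.0, 0.15, 1),
        (34.90, -86.20, 8300.0, 2.40, 2),
        (34.91, -86.21, 9100.0, 2.45, 2),
        (34.92, -86.22, 9800.0, 2.50, 2),
    ],
    dtype=[("lat", "f8"), ("lon", "f8"), ("alt", "f8"), ("time", "f8"), ("flash_id", "i4")],
)

all_alts = events["alt"]        # one column, like event_table.cols.alt[:]
first_two = events[:2]          # a slice of rows, like event_table[:10]
ids = events[:3]["flash_id"]    # rows first, then a column, like event_table[:100]['flash_id']
print(all_alts)
print(ids)
```

Indexing by a field name returns a plain array for that column, while slicing rows keeps every field.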
Let's look at flash #27. The slice [:] is what actually reads the flash_id values from the file; without it, event_table.cols.flash_id is only a description of the column.
In [35]:
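print(event_table.cols.flash_id)  # a description of the column; no data read yet
query = (event_table.cols.flash_id[:] == 27)  # [:] reads the column; == gives a boolean mask
fl_27_events = event_table[query]  # structured NumPy array holding only the matching rows
lats, lons, alts, times = fl_27_events['lat'], fl_27_events['lon'], fl_27_events['alt'], fl_27_events['time']
for lat, lon in zip(lats, lons):
    print(lat, lon)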
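The flash #27 selection above boils down to a boolean mask. On a hypothetical in-memory structured array (invented values, not the real table), comparing the flash_id column to 27 produces one True/False entry per source, and indexing with that mask keeps only the matching rows.

```python
import numpy as np

# Hypothetical events: flash_id marks which flash each source was sorted into.
events = np.array(
    [(0.10, 34.70, -86.60, 26), (0.12, 34.71, -86.61, 27),
     (0.13, 34.72, -86.62, 27), (0.20, 34.80, -86.50, 28)],
    dtype=[("time", "f8"), ("lat", "f8"), ("lon", "f8"), ("flash_id", "i4")],
)

mask = events["flash_id"] == 27   # boolean array, one entry per source
fl_27 = events[mask]              # only the rows where mask is True
lats, lons = fl_27["lat"], fl_27["lon"]
for lat, lon in zip(lats, lons):
    print(lat, lon)
```

For large files, PyTables also provides Table.read_where, which takes a condition string such as "flash_id == 27" and evaluates it inside PyTables; we have not run it against these particular files.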
In [36]:
import matplotlib
matplotlib.rc('font', size=12)
%matplotlib inline
import matplotlib.pyplot as plt
plt.scatter(lons, lats, c=times)
plt_range = (lons.min(),lons.max(),lats.min(),lats.max())
print(plt_range)
plt.axis(plt_range)
Now, let's get the associated flash properties. Use pandas to display the data table in a readable form.
In [13]:
flash_table = h5.root.flashes.LMA_090410_180000_3600
fl_27 = flash_table[flash_table.cols.flash_id[:] == 27] # This is a numpy array with a named dtype
fl_27 = pandas.DataFrame(fl_27)
In [14]:
fl_27
In [15]:
fl_27_events = pandas.DataFrame(fl_27_events)
fl_27_events
More sophisticated queries with the HDF5 data (and pandas) are possible.
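As one example of a more involved query, the sketch below counts how many sources were assigned to each flash and keeps only the flashes with at least a minimum number of sources. It runs np.unique with return_counts on a hypothetical flash_id column, and the threshold of 3 is arbitrary.

```python
import numpy as np

# Hypothetical flash_id column from an events table.
flash_id = np.array([5, 5, 5, 6, 7, 7, 7, 7, 8])

# Unique flash IDs and the number of sources belonging to each one.
ids, n_sources = np.unique(flash_id, return_counts=True)

min_sources = 3                              # arbitrary threshold for this sketch
big_flashes = ids[n_sources >= min_sources]  # flashes with enough sources
keep = np.isin(flash_id, big_flashes)        # mask over the original sources
print(dict(zip(ids.tolist(), n_sources.tolist())))
print(big_flashes)
```

With pandas, DataFrame.groupby('flash_id').size() gives the same per-flash counts.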
In [16]:
import netCDF4 as ncdf
In [17]:
nc_name = os.path.join(results_dir, "grid_files/2009/Apr/10/NALMA_20090410_180000_3600_10src_0.0109deg-dx_flash_extent.nc")
nc = ncdf.Dataset(nc_name)
print(nc)
print(nc.variables['crs'])
In [20]:
from lmatools.multiples_nc import centers_to_edges
matplotlib.rc('xtick', labelsize=10)
matplotlib.rc('ytick', labelsize=10)
lon, lat, fl_dens = nc.variables['longitude'][:], nc.variables['latitude'][:], nc.variables['flash_extent'][:]
lon_edge, lat_edge = centers_to_edges(lon), centers_to_edges(lat)
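pcolormesh draws each grid cell between its edges, so N cell centers need N+1 edges. The function below is a hypothetical sketch of that conversion, assuming a monotonic 1-D coordinate with at least two points: interior edges fall halfway between neighboring centers, and the outer edges are pushed out by half the adjacent spacing. It is not the lmatools centers_to_edges source, which may handle details differently.

```python
import numpy as np

def centers_to_edges_sketch(centers):
    """Return len(centers)+1 edges for a monotonic 1-D array of cell centers."""
    c = np.asarray(centers, dtype=float)
    mid = 0.5 * (c[:-1] + c[1:])      # edges between neighboring centers
    first = c[0] - (mid[0] - c[0])    # extend outward by half the first spacing
    last = c[-1] + (c[-1] - mid[-1])  # extend outward by half the last spacing
    return np.concatenate(([first], mid, [last]))

print(centers_to_edges_sketch([0.0, 1.0, 2.0]))
```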
In [21]:
time_index = 0
plt.pcolormesh(lon_edge, lat_edge, fl_dens[time_index,:,:], cmap='gray_r')
cbar = plt.colorbar()
In [27]:
import glob
from lmatools.grid_collection import LMAgridFileCollection
nc_filenames=glob.glob(os.path.join(results_dir, "grid_files/2009/Apr/10/*1[89]00*_flash_extent.nc"))
print(nc_filenames)
nc_field = 'flash_extent'
NCs = LMAgridFileCollection(nc_filenames, nc_field, x_name='longitude', y_name='latitude')
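The glob pattern *1[89]00*_flash_extent.nc relies on a character class: [89] matches a single 8 or 9, so the pattern selects names containing 1800 or 1900, which here picks out the files starting at 18:00 and 19:00. Python's fnmatch module applies the same shell-style rules to plain strings, so a pattern can be checked before it is pointed at a directory. Only the first file name below appears in this notebook; the others are hypothetical. The pattern matches 1800 or 1900 anywhere in the name, not only in the time field.

```python
import fnmatch

pattern = "*1[89]00*_flash_extent.nc"
names = [
    "NALMA_20090410_180000_3600_10src_0.0109deg-dx_flash_extent.nc",
    "NALMA_20090410_190000_3600_10src_0.0109deg-dx_flash_extent.nc",  # hypothetical
    "NALMA_20090410_170000_3600_10src_0.0109deg-dx_flash_extent.nc",  # hypothetical
    "NALMA_20090410_180000_3600_10src_0.0109deg-dx_flash_init.nc",    # hypothetical other field
]
# fnmatch.filter applies the same shell-style matching that glob uses on directory entries.
matched = fnmatch.filter(names, pattern)
print(matched)
```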
In [28]:
from datetime import datetime
t = datetime(2009,4,10,18,30,00)
xedge, yedge, data = NCs.data_for_time(t)
limits = xedge.min(), xedge.max(), yedge.min(), yedge.max()
In [29]:
fig = plt.figure(figsize=(12,12))
if False: # set this to True to actually write files.
    for t, lon, lat, data in NCs:
        ax = plt.subplot(111)
        # np.ma.log10 masks empty (zero) cells instead of warning about log10(0)
        mesh = ax.pcolormesh(lon, lat, np.ma.log10(data), vmin=0, vmax=3, cmap='gray_r')
        ax.axis(limits)
        plt.colorbar(mesh)
        outfile = "{0}_{1}.png".format(t.isoformat(), nc_field)
        save_path = os.path.join(results_dir, outfile)
        print(save_path)
        fig.savefig(save_path)
        fig.clf()
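Using t.isoformat() puts colons in the output file names (for example 2009-04-10T18:30:00_flash_extent.png). That works on Linux, but colons are not allowed in Windows file names. If the images need to be shared, a strftime format without colons is one option; the helper name and format string below are our own choice for illustration, not an lmatools convention.

```python
from datetime import datetime

def frame_filename(t, field):
    """Build a PNG name like 20090410_183000_flash_extent.png (no colons)."""
    return "{0}_{1}.png".format(t.strftime("%Y%m%d_%H%M%S"), field)

t = datetime(2009, 4, 10, 18, 30, 0)
print(t.isoformat())                      # 2009-04-10T18:30:00
print(frame_filename(t, "flash_extent"))  # 20090410_183000_flash_extent.png
```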
In [30]:
from ipywidgets import interactive  # older IPython versions provided this as IPython.html.widgets
from IPython.display import display
In [32]:
from matplotlib.colors import LogNorm
def plot_for_frame(frame=0):
    # get the data
    t = NCs.times[frame]
    xedge, yedge, data = NCs.data_for_time(t)
    # plot a frame of data
    fig = plt.figure(figsize=(10,10))
    ax = fig.add_subplot(111)
    # give the color limits to LogNorm itself; recent matplotlib rejects vmin/vmax passed alongside norm
    mesh = ax.pcolormesh(xedge, yedge, data,
                         norm=LogNorm(vmin=1, vmax=100), cmap='gray_r')
    ax.axis(limits)
    plt.colorbar(mesh, ax=ax)
    title = "{1} at {0}".format(t.isoformat(), nc_field)
    ax.set_title(title)
    return fig
N_frames = len(NCs.times)
w = interactive(plot_for_frame, frame=(0, N_frames-1))
display(w)
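LogNorm(vmin=1, vmax=100) spaces the color scale logarithmically: a value v maps to (log10 v - log10 vmin) / (log10 vmax - log10 vmin), so 1, 10, and 100 flashes land at 0, 0.5, and 1 along the gray_r colormap. The function below reimplements that formula with NumPy for illustration only; it is a sketch, not matplotlib's code. It masks non-positive values, which cannot be placed on a log scale, much as matplotlib's LogNorm does, so empty grid cells are left unfilled rather than colored.

```python
import numpy as np

def log_normalize(values, vmin=1.0, vmax=100.0):
    """Map values onto [0, 1] on a log10 scale; non-positive values are masked."""
    v = np.ma.masked_less_equal(np.asarray(values, dtype=float), 0.0)
    return (np.ma.log10(v) - np.log10(vmin)) / (np.log10(vmax) - np.log10(vmin))

scaled = log_normalize([0, 1, 10, 100])
print(scaled)  # first entry masked, then 0, 0.5, 1
```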